Optimal Best Arm Identification with Fixed Confidence

Authors

  • Aurélien Garivier
  • Emilie Kaufmann
Abstract

We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the ‘Track-and-Stop’ strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
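As a rough illustration of the two components described above, the following is a minimal sketch of a Track-and-Stop-style loop for K unit-variance Gaussian arms. The stopping threshold, the numerical computation of the optimal proportions, and the names optimal_weights, track_and_stop and sample are choices made for this sketch only, not the calibrated quantities analyzed in the paper.

    import numpy as np
    from scipy.optimize import minimize

    def optimal_weights(mu):
        # Approximate the optimal sampling proportions w*(mu) for unit-variance
        # Gaussian arms by maximizing, over the probability simplex, the
        # worst-case pairwise transportation cost to a challenger arm.
        K = len(mu)
        best = int(np.argmax(mu))
        others = [a for a in range(K) if a != best]

        def neg_objective(w):
            vals = [w[best] * w[b] * (mu[best] - mu[b]) ** 2
                    / (2.0 * (w[best] + w[b]) + 1e-12) for b in others]
            return -min(vals)

        res = minimize(neg_objective, np.full(K, 1.0 / K),
                       bounds=[(1e-6, 1.0)] * K,
                       constraints=[{"type": "eq",
                                     "fun": lambda w: np.sum(w) - 1.0}])
        w = np.clip(res.x, 1e-6, None)
        return w / w.sum()

    def track_and_stop(sample, K, delta, max_steps=100_000):
        # sample(a) draws one observation from arm a (unit-variance Gaussian).
        counts, sums = np.zeros(K), np.zeros(K)
        for a in range(K):                  # initialization: pull each arm once
            sums[a] += sample(a)
            counts[a] += 1
        for t in range(K, max_steps):
            mu_hat = sums / counts
            best = int(np.argmax(mu_hat))
            # Chernoff stopping rule: generalized likelihood ratio of "best is
            # truly best" against each challenger; stop when the smallest
            # statistic exceeds a threshold (a simple heuristic one here).
            z = np.inf
            for b in range(K):
                if b == best:
                    continue
                pooled = ((counts[best] * mu_hat[best] + counts[b] * mu_hat[b])
                          / (counts[best] + counts[b]))
                z = min(z, 0.5 * counts[best] * (mu_hat[best] - pooled) ** 2
                           + 0.5 * counts[b] * (mu_hat[b] - pooled) ** 2)
            if z > np.log((1.0 + np.log(t)) / delta):
                return best, int(counts.sum())
            # Tracking sampling rule: force exploration of under-sampled arms,
            # otherwise pull the arm lagging furthest behind t * w*(mu_hat).
            under = np.where(counts < np.sqrt(t) - K / 2.0)[0]
            if len(under) > 0:
                a = int(under[np.argmin(counts[under])])
            else:
                a = int(np.argmax(t * optimal_weights(mu_hat) - counts))
            sums[a] += sample(a)
            counts[a] += 1
        return int(np.argmax(sums / counts)), int(counts.sum())

    # Example usage: three Gaussian arms, confidence level 1 - delta = 0.9.
    rng = np.random.default_rng(0)
    means = np.array([0.5, 0.4, 0.3])
    arm, n_samples = track_and_stop(lambda a: rng.normal(means[a], 1.0),
                                    K=3, delta=0.1)

The forced-exploration step (pulling any arm whose count falls below sqrt(t) − K/2) is what keeps the empirical proportions close to the tracked ones even while the plug-in estimate of the best arm is still unreliable.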

Similar articles

Best-Arm Identification in Linear Bandits

We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
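For context, the linear reward model this abstract refers to is usually written as follows; the feature vectors x_a and the noise term ε_t are standard notation assumed here, not taken from the truncated text.

    r_t = \theta^\top x_{a_t} + \varepsilon_t, \qquad a^\star = \arg\max_a \theta^\top x_a

A fixed-confidence strategy must return a^\star with probability at least 1 − δ while pulling as few arms as possible; because rewards share the parameter θ, the geometry of the feature vectors (and not only the reward gaps) determines which arms are informative to sample.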

Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

We consider the problem of best arm identification with a fixed budget T, in the K-armed stochastic bandit setting, with arm distributions supported on [0, 1]. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity H, will misidentify the best arm with probability lower bounded by exp(−T/(log(K)H)), where H is the sum over all sub-optimal arms of the inv...
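Spelled out, the quoted bound (with the complexity measure that the truncated sentence appears to be describing, assumed here to be the usual sum of inverse squared gaps) reads:

    \mathbb{P}(\hat{a}_T \neq a^\star) \;\ge\; \exp\!\left( - \frac{T}{\log(K)\, H} \right), \qquad H = \sum_{a \neq a^\star} \frac{1}{\Delta_a^{2}},

where Δ_a is the gap between the mean of the best arm and that of arm a.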

Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearl...
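As a reference point for the PAC-like framework mentioned in item (1), one standard (ε, δ) formulation, assumed here rather than quoted from the paper, requires the returned arm â to satisfy

    \mathbb{P}\big( \mu_{\hat{a}} \ge \mu^\star - \varepsilon \big) \ge 1 - \delta,

with μ* a suitable notion of optimal (or near-optimal, e.g. top-quantile) mean for the arm reservoir, while keeping the number of samples as small as possible.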

Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes

Multi-Armed Bandit (MAB) problems can be naturally extended to Markov Decision Processes (MDPs). We extend the Best Arm Identification problem to episodic fixed-horizon MDPs. Here, the goal of an agent interacting with the MDP is to reach high confidence in the optimal policy in as few episodes as possible. We propose Posterior Sampling for Pure Exploration (PSPE), a Bayesian algorithm for pur...

Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies

Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curb epidemic s...

Publication year: 2016